Library Imports

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

from datetime import date

Template

spark = (
    SparkSession.builder
    .master("local")
    .appName("Section 2.4 - Constant Values")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)

sc = spark.sparkContext

import os

data_path = "/data/pets.csv"
base_path = os.path.dirname(os.getcwd())
path = base_path + data_path

pets = spark.read.csv(path, header=True)
pets.toPandas()

	id	breed_id	nickname	birthday	age	color
0	1	1	King	2014-11-22 12:30:31	5	brown
1	2	3	Argus	2016-11-22 10:05:10	10	None
2	3	1	Chewie	2016-11-22 10:05:10	15	None

Constant Values

Many Spark transformations expect a column expression rather than a plain Python value, so you will often need to turn a constant into a column expression first. We'll explore a few of these cases.

Case 1: Creating a Column with a constant value (`withColumn()`) (wrong)

pets.withColumn('todays_date', date.today()).toPandas()

---------------------------------------------------------------------------

AssertionError                            Traceback (most recent call last)

<ipython-input-4-f87e239cb534> in <module>()
----> 1 pets.withColumn('todays_date', date.today()).toPandas()


/usr/local/lib/python2.7/site-packages/pyspark/sql/dataframe.pyc in withColumn(self, colName, col)
   1846 
   1847         """
-> 1848         assert isinstance(col, Column), "col should be Column"
   1849         return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)
   1850 


AssertionError: col should be Column

What Happened?

Spark functions that take a col argument usually require a Column expression. As seen in the previous section, withColumn() worked fine when we gave it a column from the current dataframe. That is not the case when we want to set a column to a constant value: date.today() is a plain Python object, not a Column.

If you get an AssertionError: col should be Column, this is usually the cause. We'll look at how to fix it next.

Case 1: Creating a Column with a constant value (`withColumn()`) (correct)

pets.withColumn('todays_date', F.lit(date.today())).toPandas()

	id	breed_id	nickname	birthday	age	color	todays_date
0	1	1	King	2014-11-22 12:30:31	5	brown	2019-02-14
1	2	3	Argus	2016-11-22 10:05:10	10	None	2019-02-14
2	3	1	Chewie	2016-11-22 10:05:10	15	None	2019-02-14

What Happened?

F.lit() wraps a Python value in a column expression, which withColumn() accepts. Every row of the new column gets the same value.
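The failure and the fix can be reproduced without Spark. The sketch below is a toy model in plain Python, not PySpark's implementation: the Column class, lit() and with_column() here are hypothetical stand-ins, and only the isinstance check is taken from the traceback above.

```python
from datetime import date


class Column:
    """Toy stand-in for pyspark.sql.Column (hypothetical, illustration only)."""

    def __init__(self, expr):
        self.expr = expr

    def __repr__(self):
        return f"Column<{self.expr}>"


def lit(value):
    """Toy stand-in for F.lit(): wrap a plain Python value in a Column."""
    return value if isinstance(value, Column) else Column(repr(value))


def with_column(columns, name, col):
    """Toy stand-in for withColumn(), using the check shown in the traceback."""
    assert isinstance(col, Column), "col should be Column"
    return {**columns, name: col}


pets_columns = {"nickname": Column("nickname")}

# Passing a plain Python date fails the type check.
try:
    with_column(pets_columns, "todays_date", date.today())
    error_message = None
except AssertionError as exc:
    error_message = str(exc)

print(error_message)  # col should be Column

# Wrapping the value first passes the check.
fixed = with_column(pets_columns, "todays_date", lit(date.today()))
print(sorted(fixed))  # ['nickname', 'todays_date']
```

The point of the sketch is that the check looks only at the type of the argument, so any value has to be wrapped into a column expression before it is handed over.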

More Examples

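(
    pets
    # Comparing a column with a Python literal already yields a Column.
    .withColumn('age_greater_than_5', F.col("age") > 5)
    # A bare constant has to be wrapped in F.lit().
    .withColumn('height', F.lit(150))
    # These two filters are equivalent; F.lit() is optional here.
    .where(F.col('breed_id') == 1)
    .where(F.col('breed_id') == F.lit(1))
    .toPandas()
)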

	id	breed_id	nickname	birthday	age	color	age_greater_than_5	height
0	1	1	King	2014-11-22 12:30:31	5	brown	False	150
1	3	1	Chewie	2016-11-22 10:05:10	15	None	True	150

What Happened?

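(We will look into equality statements later.)

The example mixes a constant column (height, created with F.lit()) with column expressions built from F.col(). A comparison such as F.col("age") > 5 or F.col('breed_id') == 1 already returns a Column, so the Python literal on the right-hand side does not need F.lit(). The two where() calls are therefore equivalent.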
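Why a comparison with a bare literal still produces a column expression comes down to operator overloading on the column object. The following is a toy model in plain Python, not PySpark's actual code: the Column class, col() and lit() are hypothetical stand-ins that only mimic the behaviour seen in the chained example above.

```python
class Column:
    """Toy stand-in for pyspark.sql.Column (hypothetical, illustration only)."""

    def __init__(self, expr):
        self.expr = expr

    def _binary(self, op, other):
        # Wrap plain Python values so both operands are column expressions.
        other = other if isinstance(other, Column) else lit(other)
        return Column(f"({self.expr} {op} {other.expr})")

    def __gt__(self, other):
        return self._binary(">", other)

    def __eq__(self, other):
        # Returns a new expression instead of a Python bool.
        return self._binary("=", other)


def col(name):
    """Toy stand-in for F.col(): refer to a column by name."""
    return Column(name)


def lit(value):
    """Toy stand-in for F.lit(): wrap a constant in a Column."""
    return Column(repr(value))


age_check = col("age") > 5
implicit = col("breed_id") == 1
explicit = col("breed_id") == lit(1)

print(age_check.expr)  # (age > 5)
print(implicit.expr)   # (breed_id = 1)
print(explicit.expr)   # (breed_id = 1)
```

In this model the literal is wrapped inside the operator, which is why the implicit and explicit forms end up as the same expression.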

Summary

Use F.lit() to assign a constant value to a column.
Comparisons built with F.col(), such as F.col("age") > 5, are also column expressions.
When in doubt, wrap the constant in F.lit().

Section 2.4 - Constant Values and Column Expressions
